A near-optimal polynomial time algorithm for learning in certain classes of stochastic games
Abstract
We present a new algorithm for polynomial time learning of optimal behavior in single-controller stochastic games. This algorithm incorporates and integrates important recent results of Kearns and Singh [5] in reinforcement learning and of Monderer and Tennenholtz [7] in repeated games. In stochastic games, the agent must cope with the existence of an adversary whose actions can be arbitrary. In particular, this adversary can withhold information about the game matrix by never, or only rarely, performing certain actions. This forces upon us an exploration vs. exploitation dilemma more complex than in Markov decision processes: given information about particular parts of a game matrix, the agent must decide how much effort to invest in learning the unknown parts of the matrix. We present a polynomial time algorithm that addresses these issues in the context of the class of single-controller stochastic games, providing the agent with near-optimal return.
Similar resources
A Near-Optimal Poly-Time Algorithm for Learning a class of Stochastic Games
We present a new algorithm for polynomial time learning of near-optimal behavior in stochastic games. This algorithm incorporates and integrates important recent results of Kearns and Singh [1998] in reinforcement learning and of Monderer and Tennenholtz [1997] in repeated games. In stochastic games we face an exploration vs. exploitation dilemma more complex than in Markov decision processes....
A Near-Optimal Polynomial Time Algorithm for Learning in Stochastic Games
We present a new algorithm for polynomial time learning of optimal behavior in stochastic games. This algorithm incorporates and integrates important recent results of Kearns and Singh [5] in reinforcement learning and of Monderer and Tennenholtz [7] in repeated games. In stochastic games, the agent must cope with the existence of an adversary whose actions can be arbitrary. In particular, this a...
R-MAX - A General Polynomial Time Algorithm for Near-Optimal Reinforcement Learning
R-max is a very simple model-based reinforcement learning algorithm which can attain near-optimal average reward in polynomial time. In R-max, the agent always maintains a complete, but possibly inaccurate model of its environment and acts based on the optimal policy derived from this model. The model is initialized in an optimistic fashion: all actions in all states return the maximal possible...
Near-Minimum-Time Motion Planning of Manipulators along Specified Path
The large amount of computation necessary for obtaining a time-optimal solution for moving a manipulator along a specified path has made it impossible to introduce an online time-optimal control algorithm. Most of this computational burden is due to the calculation of switching points. In this paper a learning algorithm is proposed for finding the switching points. The method, which can be used for both ...
Two-stage stochastic programming model for capacitated complete star p-hub network with different fare classes of customers
In this paper, a stochastic programming approach is applied to the airline network revenue management problem. The airline network with the arc capacitated single hub location problem based on complete–star p-hub network is considered. We try to maximize the profit of the transportation company by choosing the best hub locations and network topology, applying revenue management techniques to al...
Journal title:
- Artif. Intell.
Volume 121
Publication date 2000